Publish or perish: analysis of scientific productivity using maximum entropy principle and fluctuation-dissipation theorem

Using data retrieved from the INSPEC database we have quantitatively discussed a few syndromes 
of the publish-or-perish phenomenon, including continuous growth of rate of scientific productivity, 
and continuously decreasing percentage of those scientists who stay in science for a long time. 
Making use of the maximum entropy principle and fluctuation-dissipation theorem, we have shown 
that the observed fat-tailed distributions of the total number of papers x authored by scientists 
may result from the density of states function g(x;r) underlying scientific community. Although 
different generations of scientists are characterized by different productivity patterns, the function 
g(x;r) is inherent to researchers of a given seniority r, whereas the publish-or-perish phenomenon 
is caused only by an external field θ influencing researchers. 

PACS numbers: 87.23.Ge, 89.75.-k, 89.70.+c 



I. INTRODUCTION 

"Nowadays, (...) Evaluations of scientists depend on number of papers, positions in lists
of authors, and journals' impact factors. In Japan, Spain and elsewhere, such assessments
have reached formulaic precision. But bureaucrats are not wholly responsible for these
changes - we scientists have enthusiastically colluded. What began as someone else's
measure has become our (own) goal. (...)" [1]. In fact, a number of scientists all over the
world are raising the alarm that research is in crisis. Academics have to publish or perish.
Scientific articles have become a valuable commodity both for authors and publishers [2].
The politics of publication is not only about publishing articles that are as valuable as
possible. Of course, since articles in leading journals certify one's membership in the
scientific elite, the impact factor of journals matters; but the total number of publications
is also of great importance, since frequent publications help to sustain one's career and
are well regarded when applying for funds. Authors have to plan when, how and with whom to
publish their results. Quoting Lawrence [1]: "The ideal time is when a piece of research is
finished and can carry a convincing message, but in reality it is often submitted at the
earliest possible moment. (...) Findings are sliced as thin as salami and submitted to
different journals to produce more papers." Scientists who are aware of the
publish-or-perish phenomenon warn that research professionalism may be sacrificed in the
pursuit of research grants and fame, or simply for fear of losing a position.

In this paper, using data retrieved from the INSPEC 
database, we quantitatively analyze two syndromes of 
the publish-or-perish phenomenon: continuous growth of 
rate of scientific productivity and continuously decreas- 
ing percentage of those scientists who stay in science for 
a long time. 

The paper is organized as follows. In the next section we start with a simple examination
of scientific productivity distributions for all INSPEC authors together, as was done by
Lotka [3] and Shockley [4]. Then, we study the temporal evolution of the scientists. From
the whole database we draw long-life scientists, i.e. scientists who were doing research
for at least 18 years. Having such a set of scientists we divide it into the so-called
cohorts including those who started to publish in a given year T (i.e. T = 1975, 1976,
..., 1987). We show that, unlike the quickly increasing number of all authors listed in the
INSPEC database, the number of long-life scientists, as characterized by the year of the
first publication T, remains almost constant, indicating a decreasing percentage of
long-life scientists among all researchers. We also show that histograms of scientific
productivity N(x; t, T) within T-cohorts, measured by the number of articles x, change over
time t from almost exponential (when a cohort contains young scientists) to clearly
fat-tailed (when the same cohort includes mature researchers). Additionally, we observe
that the number of articles produced by a representative of each cohort increases with the
square of seniority r = t - T, i.e. ⟨x⟩ ~ r², indicating that each cohort possesses a fixed
acceleration parameter a(T) = d²⟨x⟩/dr² which, in turn, quickly increases with T. Finally,
in Sec. III, we analyze the observed distributions of scientific productivity in terms of
equilibrium statistical physics. We show that the fat-tailed histograms N(x; t, T) may
result from the inherent density of states function g(x; r) characterizing the scientific
community. We also introduce the parameter θ(t, T), which has a similar meaning as the
inverse temperature β in the canonical ensemble, and describes an external field
influencing scientists. The parameter allows us to quantify the effect of the
publish-or-perish phenomenon.
FIG. 1: The figure explains the procedure used to retrieve long-life scientists. We assume
that an author belongs to the T-cohort if the period of time that passed between his/her
first and last publication fulfills the relation T_f - T ≥ 17, where T_f is the year of the
last publication indexed in our data set. According to the procedure only the first two
authors, whose publication history is depicted in the figure, are considered to be
long-life T-scientists.



II. SCIENTIFIC PRODUCTIVITY - FUNDAMENTAL RESULTS



In this study we report on scientific productivity of all authors (over 3 million) listed
in the INSPEC database [5] in the period 1969 - 2004. The database, produced by the
Institution of Electrical Engineers, provides a few million records indexing scientific
articles published world-wide in physics, electrical engineering and electronics, computing
and information technology. Although each INSPEC record contains a number of fields
(including publication title, classification codes etc.), for our purposes we have
retrieved only two of them: authors' names (i.e. names with all initials) and publication
year. Having the data we were able to determine the initial year of one's scientific
activity T (i.e. the year of the first publication) and also the cumulative number of
his/her publications in the following years. Additionally, from the whole data set we have
drawn long-life scientists (i.e. scientists who were productive for at least 18 years, see
Fig. 1), and we have divided them into the so-called T-cohorts, with T having the same
meaning as previously.
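The selection of long-life scientists described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a list of `(author_name, publication_year)` pairs), the function names `build_cohorts` and `cumulative_papers`, and the toy records are hypothetical, and the criterion T_f - T ≥ 17 is our reading of "at least 18 years" of activity. Name disambiguation, which matters for real bibliographic data, is not addressed.

```python
from collections import defaultdict


def build_cohorts(records, min_span=17, last_year=2004):
    """Group long-life authors by the year T of their first publication.

    records: iterable of (author_name, publication_year) pairs (assumed format).
    Returns {T: {author: sorted publication years}} for authors whose first and
    last indexed publications are at least `min_span` years apart.
    """
    years_by_author = defaultdict(list)
    for author, year in records:
        if year <= last_year:
            years_by_author[author].append(year)

    cohorts = defaultdict(dict)
    for author, years in years_by_author.items():
        years.sort()
        first, last = years[0], years[-1]
        if last - first >= min_span:      # long-life criterion T_f - T >= 17
            cohorts[first][author] = years
    return dict(cohorts)


def cumulative_papers(years, first_year, seniority):
    """Number of papers published up to and including year first_year + seniority."""
    return sum(1 for y in years if y <= first_year + seniority)


# Made-up records, for illustration only.
records = [
    ("A. Smith", 1975), ("A. Smith", 1980), ("A. Smith", 1993),
    ("B. Jones", 1975), ("B. Jones", 1978),
    ("C. Wu", 1987), ("C. Wu", 2004),
]
cohorts = build_cohorts(records)
print(sorted(cohorts))
print(cumulative_papers(cohorts[1975]["A. Smith"], 1975, 6))
```

Counting, for every author of a cohort, the cumulative number of papers at a fixed seniority gives the sample from which a histogram such as N(x; t, T) can be built.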

A few important findings on the evolution of the scientific community can be immediately
drawn from a simple comparison of the number of all T-authors and the number of those
authors who turned out to be long-life scientists. However, before we discuss how the
numbers and their ratio depend on T, two limitations of our data should be noted. First,
since the INSPEC database does not contain information about articles published before
1969, the initial year of scientific activity T for scientists indexed in the database in
the early seventies may be incorrect. That is why, for further analysis, we have restricted
ourselves to the period starting at T = 1975. Second, due to the criterion of 18 years of
activity, taken when specifying T-cohorts, the number of cohorts is limited to 13,
respectively for T = 1975, 1976, ..., 1987. Keeping in mind the mentioned constraints one
can see (Fig. 2) that although the number of all authors listed in the INSPEC database
increases every year, the number of long-life scientists remains almost constant (the
downward trend observed in the eighties should not be taken into account as it may result
from finite-size effects due to the reduction of the period between T+17 and 2004; consider
the case of the 2nd Author in Fig. 1). The chief conclusion resulting from the above
observations is that the percentage of long-life scientists among all scientists
monotonically decreases in time (see inset in Fig. 2).

FIG. 2: Number of all authors listed in the INSPEC database and the number of long-life
scientists versus the year of the first publication T.

In the rest of the section we will concentrate on the fundamental features of
distributions describing scientific productivity of authors indexed in INSPEC. As a matter
of fact, scientific productivity, measured by the number of papers authored, has a long
history of study in socio- and bibliometrics, with the articles by Lotka [3] and Shockley
[4] being famous early examples. Both of these authors found that the number of papers
produced by scientists has a fat-tailed distribution, exhibiting both a large number of
authors who contributed only a few articles, and a small number of authors who made a very
large number of contributions. More precisely, Lotka (1926) studied a sample of 6891
authors listed in Chemical Abstracts during the period 1907 - 1916, finding that the number
of authors making x publications was described by a power law



$$ N(x) \sim x^{-\gamma}, \tag{1} $$



with γ ≈ 2, whereas Shockley (1957) investigated scientific productivity of 88 research
staff members at the Brookhaven National Laboratory in the USA, finding a log-normal
distribution

$$ N(x) \sim \frac{1}{\sqrt{2\pi}\, s\, x}\, e^{-(\ln x - m)^2 / (2 s^2)}. \tag{2} $$



In Fig. 3 we have shown on logarithmic scales histograms of the number of papers written
by: all authors listed in INSPEC and all long-life scientists in the database. As expected,
both distributions are highly skewed, and their fat tails are due to long-life scientists.
One can also see that the distribution of all authors regardless of their seniority is well
described by the log-normal distribution (2), which for reasons elaborated by Sornette and
Cont [6] (see also [7, 8]) may be confused with a distribution having a power law tail (1).
In Fig. 3, apart from the log-normal fit to our data, we have shown that a distribution
composed of two power laws also fits our data very well. Nevertheless, the exponents γ for
both regions of the power law scaling significantly differ from the exponent γ ≈ 2
predicted by Lotka.

FIG. 3: Histograms of the number of papers written by: all authors in INSPEC (solid
squares) and long-life scientists in the database (open squares). Solid lines represent
fits to the data as described in the text: log-normal distribution (gray line) with
m = 0.43 ± 0.01 and s = 1.69 ± 0.01, and distribution composed of two power laws (black
lines), one for small and intermediate events (γ = 1.67 ± 0.01) and the other for extreme
events (γ = 2.87 ± 0.03).
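To make the two candidate forms concrete, the following minimal Python sketch writes down the power law (1) and the log-normal (2) and estimates m and s from a list of paper counts. The estimator used here (sample mean and standard deviation of ln x) is a simple stand-in chosen for illustration; it is not claimed to be the fitting procedure behind Fig. 3, and the sample counts are made up.

```python
import math


def power_law(x, gamma, c=1.0):
    """Power-law form of Eq. (1): N(x) = c * x**(-gamma); c is an arbitrary prefactor."""
    return c * x ** (-gamma)


def lognormal_density(x, m, s):
    """Log-normal form of Eq. (2), normalised to unit area over x > 0."""
    return math.exp(-(math.log(x) - m) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s * x)


def estimate_lognormal(paper_counts):
    """Moment estimates of m and s from ln(x) (illustrative, not the paper's fit)."""
    logs = [math.log(x) for x in paper_counts if x > 0]
    m = sum(logs) / len(logs)
    s = math.sqrt(sum((v - m) ** 2 for v in logs) / len(logs))
    return m, s


counts = [1, 1, 2, 3, 5, 8, 40]   # made-up paper counts, for illustration only
m, s = estimate_lognormal(counts)
print(round(m, 3), round(s, 3))
print(power_law(10, 2.0), lognormal_density(10, m, s))
```

On doubly logarithmic axes the local slope of the log-normal form is d ln N / d ln x = -1 - (ln x - m)/s², which changes only slowly with ln x when s is large; this is one way to see why a broad log-normal can be mistaken for a power law over a limited range of x.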

The reported studies show that scientists differ enormously in the number of papers they
publish. Although at present fat-tailed distributions are not as surprising for physicists
as they were 20 years ago, the appearance of highly skewed distributions characterizing
scientific productivity is still strange, since it refers to a scientific elite who have
undergone a rigorous selection procedure and are expected to be more homogeneous. At the
moment, one may for example suggest that the noticed differences between scientists result
from the heterogeneity of the analysed sample (e.g. as is the case in nonextensivity driven
by fluctuations [9, 10]). To forestall such suggestions, in the following we will
concentrate on the analysis of T-cohorts, as they were characterized at the beginning of
this section. Although the approach makes our data more homogeneous, we are aware that it
still does not take into account other factors which influence scientific productivity
(e.g. access to resources which facilitate research, or geopolitical conditions). In the
next section we will try to convince the reader that the effect of those omitted factors
may be understood in terms of a single function having the same meaning as the density of
states in equilibrium statistical physics.

Due to our approach, whatever differences are observed among T-scientists, they can be
logically decomposed into only two sorts: (i) life-course differences, which are the
effects of biological and social aging, and (ii) cohort differences, which are differences
between cohorts at comparable points in career history. To our knowledge the only similar
analysis of scientific productivity was performed by Allison and Stewart [11], who analysed
a sample of U.S. scientists in university departments offering advanced degrees in biology,
chemistry, physics and mathematics. The authors divided the sample into 8 age strata by the
number of years since Ph.D., representing different cohorts at different points during
their career history. Unfortunately, lacking longitudinal data the authors were only able
to observe life-course differences among scientists, assuming that cohort differences are
negligible.



T-cohort     a       b       A       B       C       E       T_1
1975       0.025   0.39    0.06   -1.02    2.86    0.48   -7.24
1977       0.028   0.40    0.03   -1.47    3.09    0.86   -7.49
1979       0.035   0.37    0.06   -0.97    3.00    0.58   -5.38
1981       0.048   0.36    0.01   -2.15    3.50    3.20   -4.60
1983       0.055   0.39    0.01   -2.38    3.63    3.37   -4.53
1985       0.066   0.42    0.04   -1.38    3.26    1.31   -3.64
1987       0.119   0.35    0.07   -1.36    3.25    1.36   -1.80

TABLE I: Values of parameters a, b, A, B, C, E, T_1 for a few T-cohorts. See Eqs. (3), (4),
and (14).



In Fig. 4 we have presented how the histogram of scientific productivity N(x; t, T)
depends on time t as a T-cohort ages. In general, the scenario is the same for all analysed
T-cohorts: N(x; t, T) changes from almost exponential (when a cohort contains young
scientists) to clearly fat-tailed (when the same cohort consists of mature researchers).
The results exemplify life-course differences among long-life scientists, and in some sense
confirm the so-called hypothesis of accumulative advantage [11], which claims that due to a
variety of social and other mechanisms productive scientists are likely to be even more
productive in the future, whereas those who produce little original work are likely to
decline further in their productivity.

FIG. 4: Histograms of scientific productivity N(x; t, T) characterizing cohorts of
long-life scientists, who started to publish in a given year T = 1975 or 1985, and
r = t - T = 6, 12, 18. Horizontal axes: number of publications x. (Detailed description of
the figure is given in the text.)

In order to examine cohort differences we have analysed how the average ⟨x⟩ and the
variance ⟨x²⟩ - ⟨x⟩² of the distribution N(x; t, T) depend on the cohort parameter
T = 1975, ..., 1987, and how they change over time t. We have found that the parameters are
well-defined increasing functions of time (see Fig. 5)

$$ \frac{\partial \langle x \rangle}{\partial t} = a r + b \tag{3} $$

and

$$ \langle x^2 \rangle - \langle x \rangle^2 = A (r - B)^C, \tag{4} $$

where r = t - T and a, b, A, B, C depend on T (see Tab. I).
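The link between the linear growth rate d⟨x⟩/dr = a r + b and the quadratic growth ⟨x⟩ ~ r² mentioned in the Introduction follows from a single integration over seniority. The short Python sketch below evaluates the integrated form for two cohorts of Table I. The function names and the integration constant `x0` (the value of ⟨x⟩ at r = 0, set to zero here) are assumptions made for illustration; the text does not specify them.

```python
def mean_papers(r, a, b, x0=0.0):
    """Integrate d<x>/dr = a*r + b from 0 to r; x0 is the assumed value of <x> at r = 0."""
    return 0.5 * a * r ** 2 + b * r + x0


def variance_papers(r, A, B, C):
    """Empirical variance law of Eq. (4): A * (r - B)**C, meaningful for r > B."""
    return A * (r - B) ** C


# Parameters a and b of the 1975 and 1987 cohorts, as listed in Table I.
for T, a, b in [(1975, 0.025, 0.39), (1987, 0.119, 0.35)]:
    print(T, mean_papers(18, a, b))
```

With these parameters the quadratic term contributes roughly a third of the growth over 18 years for the 1975 cohort, but about three quarters of it for the 1987 cohort, which illustrates how strongly the acceleration a(T) shapes the productivity of younger cohorts.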

At this point, it is worth mentioning that although our analysis encompasses only the 18
initial years of the cohorts' history, we have also verified the above relations for 28
years of activity of the oldest 1975-cohort, finding excellent agreement with the results
obtained for other cohorts and for the shorter period of time (see insets in Fig. 5).
Nevertheless, one should be aware that even the most productive scientists slow down their
pace of work in their declining years. According to Zhao [12], the optimal age for
scientific productivity is between 25 and 45, reaching the peak for researchers around 37
(i.e. about 18 years since the beginning of the career). Similar findings have also been
reported by Kyvik [13], who found that publishing activity reaches a peak in the
45-49-year-old age group and declines by about 30% among researchers over 60 years old.
Summing up, in the light of previous results on the relation between age and productivity,
findings reported in our paper apply to scientists in the most prolific period of their
career.

Now, let us briefly comment on the relations (3) and (4). First, note that the linear
dependence on seniority r in Eq. (3) implies that an average representative of each cohort
possesses an acceleration parameter a, which is fixed during the whole scientific career.
Moreover, the parameter increases with T (cf. Tab. I and Fig. 6), certifying that younger
(in terms of T) scientists are better skilled to produce more papers than their older
colleagues at the same point of the scientific career. It is a matter of debate whether the
differences in a are due to better adaptation of young people to technological achievements
(i.e. computers and the Internet), or whether they result from the tough competition
between researchers, and are one of the syndromes of the publish-or-perish phenomenon. In
the next section, exploiting relations (3) and (4), we will show that regardless of the
reasoning the explanation of accelerated productivity naturally emerges as a result of
treating the scientific community by means of methods borrowed from equilibrium statistical
physics.



III. THEORETICAL APPROACH TO 
SCIENTIFIC PRODUCTIVITY - DENSITY OF 
STATES UNDERLYING SCIENTIFIC 
COMMUNITY 

In sociometrics, explanations of highly skewed histograms of scientific productivity N(x)
(see Fig. 3) are generally of two (not necessarily exclusive) types [14]. The sacred spark
(i.e. heterogeneity) hypothesis says that the observed discrepancies in scientific
productivity originate in substantial, predetermined differences among scientists in their
ability and motivation to do creative research, while the accumulative advantage (i.e.
reinforcement) hypothesis [11] claims that due to a variety of social and other mechanisms
productive scientists are likely to be even more productive in the future. According to the
first hypothesis, skewed distributions of hidden attributes characterizing scientists
naturally lead to a skewed distribution of productivity, whereas the second hypothesis
argues that the observed fat-tailed histogram N(x) results from sophisticated stochastic
processes underlying scientific productivity (see e.g. [15, 16]).

FIG. 5: Change of the average productivity d⟨x⟩/dr, and the variance ⟨x²⟩ - ⟨x⟩² of
cohorts' productivity distributions N(x; t, T) versus seniority r = t - T. Points represent
real data retrieved from the INSPEC database, whereas solid lines express numerical fits
according to Eqs. (3) and (4). (Detailed description of the figure is given in the text.)

FIG. 6: Acceleration parameter a and initial velocity b versus cohort parameter T. As
previously, points represent data retrieved from INSPEC, whereas solid lines express the
trend in the data.

In this section we will present an alternative explanation of the skewed productivity
distributions. Since we have already noticed that the fat tail of the distribution
P(x) = N(x)/N characterizing the set of all authors listed in INSPEC is due to long-life
scientists (cf. Fig. 3), in the following we shall only concentrate on distributions
P(x; t, T) = N(x; t, T)/N(T) characterizing T-cohorts (see Fig. 4). In order to describe
the scientific community, we will exploit the maximum entropy principle [17, 18], and we
will adopt some of the fundamental concepts from equilibrium statistical mechanics (like
statistical ensemble, phase space, and density of states). We will also argue that our
approach does not contradict the sociological hypotheses mentioned at the beginning of the
section.

In physics, the notion of statistical ensemble means a 
very large number of mental copies of the same system 
taken all at once, each of which represents a possible 
state that the real system might be in. When the en- 
semble is properly chosen it should satisfy the ergodicity 
condition, which guarantees that the average of a thermo- 
dynamic quantity across the members of the ensemble is 
the same as the time-average of the quantity for a single 
system. 

In our approach we will identify a representative of a given T-cohort with a physical
system, and we will try to describe such a system (i.e. a long-life scientist) in terms of
statistical physics. Since (at least now) we do not have access to parallel worlds, in our
approach a large group of copies of the same scientist will be replaced with a large set of
macroscopically similar long-life scientists, i.e. scientists belonging to the same
T-cohort, and taken at a given point in their scientific career r = t - T. Here, the
assumption of macroscopic similarity means that the considered scientists are exposed to
the same external field (influence) θ(t, T), which forces (motivates) scientists to publish
an average number of publications ⟨x⟩(t, T). The external field (influence) θ has the same
meaning as the inverse temperature β = (kT)^{-1}, which determines the average energy ⟨E⟩
in the canonical ensemble [19].

Now, suppose that one would like to establish the probability distribution P(Ω) over a
given T-cohort at time t, where

$$ \Omega = \{y_1, y_2, \dots, y_n\} \tag{5} $$

stands for states (i.e. microstates) of a single scientist, who belongs to the considered
cohort/ensemble. (Let us explain that the parameters y_i are coordinates of a hidden phase
space underlying the scientific community, and determining scientific productivity

$$ x = x(\Omega) = x(y_1, y_2, \dots, y_n). \tag{6} $$

Of course, there exists a number of such parameters, including: research field, IQ level,
age, number of coworkers, motivation, funds etc., but as it turns out in the rest of this
section a few important findings about our ensembles may be obtained even without detailed
knowledge of the parameters.) According to the maximum entropy school of statistical
physics initiated by Edwin T. Jaynes in 1957 [17, 18], the best choice for the distribution
P(Ω) is the one that maximizes the Shannon entropy

$$ S = -\sum_{\Omega} P(\Omega) \ln P(\Omega), \tag{7} $$

subject to the constraint

$$ \langle x \rangle (t, T) = \sum_{\Omega} x(\Omega) P(\Omega), \tag{8} $$

plus the normalization condition

$$ \sum_{\Omega} P(\Omega) = 1. \tag{9} $$

FIG. 7: Main stage: external field (influence) θ(t, T) versus seniority r = t - T for two
cohorts T = 1975 and T = 1985. Inset: productivity parameter defined as k = θ^{-1} versus r
for the same cohorts.



The Lagrangian for the above problem is given by the expression

$$ \mathcal{L} = -\sum_{\Omega} P(\Omega) \ln P(\Omega) + \alpha(t,T) \Big( 1 - \sum_{\Omega} P(\Omega) \Big) + \theta(t,T) \Big( \langle x \rangle (t,T) - \sum_{\Omega} x(\Omega) P(\Omega) \Big), \tag{10} $$

where the multipliers θ(t, T) (external field) and α(t, T) are to be determined by (8) and
(9). Differentiating the Lagrangian with respect to P(Ω), and then equating the result to
zero, one gets the desired probability distribution over the T-cohort

$$ P(\Omega) = \frac{e^{-\theta(t,T)\, x(\Omega)}}{Z(t,T)}, \tag{11} $$

where Z(t, T) represents the partition function (normalization constant), and

$$ Z(t,T) = \sum_{\Omega} e^{-\theta(t,T)\, x(\Omega)} = e^{\alpha(t,T) + 1}. \tag{12} $$

Before we proceed further, let us make two comments here. First, since each T-cohort
changes over time t, a sceptic may question the validity of our equilibrium approach. In
order to justify the approach we assume that the time dependence of T-cohorts may be
considered in terms of a quasistatic equilibrium process. (Let us recall that in a
quasistatic process, due to sufficiently slow dynamics, a system is considered to pass from
one equilibrium state to another.) The assumption allows us to treat each T-cohort in
separate years t > T as an equilibrium system. The second comment relates to the ergodicity
of our ensembles. In statistical physics the ergodic hypothesis says that, over long
periods of time, the time spent in some region of the phase space corresponding to
microstates with the same energy is proportional to the volume of this region, i.e. that
all accessible microstates Ω are equally probable over a long period of time. Equivalently,
the hypothesis says that the time average and the average over the statistical ensemble are
the same. In the case of long-life scientists, we may only speculate about the underlying
phase space, its dimensionality and coordinates (5). Even if we were able to enumerate most
of the significant coordinates characterizing such scientists, surely a part of these
coordinates, including e.g. motivation, would be impossible to quantify. Summarizing, given
the above and other difficulties it appears impossible to verify the ergodic hypothesis for
our ensembles, and the question of whether ergodicity is fulfilled here remains open.

Now, having the theoretical framework we are in a position to analyze how the external
field θ(t, T) influencing scientists depends on T, and how it changes over time t. In order
to calculate the parameter we use the fluctuation-dissipation relation

$$ \langle x^2 \rangle - \langle x \rangle^2 = -\frac{\partial \langle x \rangle}{\partial \theta} = -\frac{\partial \langle x \rangle}{\partial r} \frac{\partial r}{\partial \theta}, \tag{13} $$

which may be simply derived from P(Ω) (11). (Keep in mind that the ensemble averages ⟨x⟩
and ⟨x²⟩, and also θ, depend on both t and T.) At the moment, note that in the previous
section we have already found empirical relations corresponding to both sides of the last
formula. Inserting the relations (3) and (4) into (13), after some algebra one obtains

$$ \theta(t - T) = -\int \frac{a r + b}{A (r - B)^{C}}\, dr = E\, (r - B)^{1 - C} (r - T_1) + D, \tag{14} $$

where parameters a, b, A, B, C, D depend on T, whereas E, T_1 are functions of these
parameters (see Tab. I).

FIG. 8: Differences between cohorts. External field θ(t, T) coupled to the number of
publications x versus the cohort parameter T for r = t - T = 9. The solid line stands for
the trend in the empirical data.
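The fluctuation-dissipation relation used above follows directly from the canonical form of P(Ω). The derivation below is a standard sketch, assuming that the productivity x(Ω) of a microstate does not itself depend on θ; the arguments (t, T) are suppressed.

```latex
\begin{aligned}
\langle x \rangle &= \frac{1}{Z} \sum_{\Omega} x(\Omega)\, e^{-\theta x(\Omega)}
                   = -\frac{\partial \ln Z}{\partial \theta}, \\
\frac{\partial \langle x \rangle}{\partial \theta}
  &= -\frac{1}{Z} \sum_{\Omega} x^2(\Omega)\, e^{-\theta x(\Omega)}
     + \Big( \frac{1}{Z} \sum_{\Omega} x(\Omega)\, e^{-\theta x(\Omega)} \Big)^{2}
   = -\big( \langle x^2 \rangle - \langle x \rangle^2 \big), \\
\frac{\partial \theta}{\partial r}
  &= -\frac{\partial \langle x \rangle / \partial r}{\langle x^2 \rangle - \langle x \rangle^2}
   = -\frac{a r + b}{A\, (r - B)^{C}} .
\end{aligned}
```

The last line combines the second one with the chain rule and the empirical laws for the growth rate of ⟨x⟩ and for the variance; integrating it over r gives the external field up to an additive constant D.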

In Fig. 7 we have presented how the external field θ(t, T) changes over seniority r. Since
the field is conjugate to the cumulative number of publications, its decreasing character
indicates that small values of the field correspond to large productivity, and vice versa -
large fields induce small productivity. (The inverse of θ, i.e. k = θ^{-1}, stands for a
productivity field which has a more obvious sociological interpretation: larger k enforces
a larger number of papers. See inset in Fig. 7.) Having in mind the inverse relationship
between θ and the number of publications x, one can argue that the constant of integration
D in (14) must be equal to zero. The reasoning behind the statement is the following. Given
that the considered long-life scientists never die, still being in the most prolific period
of their career, one may simply imagine that in the limit of t ≈ r → ∞ the total number of
publications produced by these scientists must approach infinity, which corresponds to
θ(∞, T) = 0, and respectively D(T) = 0.
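As a numerical cross-check of the closed form (14) with D = 0, the sketch below evaluates θ(r) for the 1975 cohort and compares its finite-difference derivative with -(a r + b)/(A (r - B)^C). The expressions for the prefactor E and the offset T_1 in the code come from carrying out the integral by hand for this sketch (valid for r > B and C different from 1 and 2); they are assumptions of the example, and the offset computed this way need not coincide digit for digit with the rounded T_1 column of Table I. The function names are hypothetical.

```python
def theta(r, a, b, A, B, C, D=0.0):
    """Antiderivative of -(a*r + b) / (A*(r - B)**C), written as E*(r-B)**(1-C)*(r-T1) + D."""
    u = r - B
    E = a / (A * (C - 2.0))                                # prefactor, cf. column E of Table I
    T1 = B - (a * B + b) * (2.0 - C) / (a * (1.0 - C))     # offset from this sketch's algebra
    return E * u ** (1.0 - C) * (r - T1) + D


def dtheta_dr(r, a, b, A, B, C):
    """Slope of the field obtained by combining Eqs. (3), (4) and (13)."""
    return -(a * r + b) / (A * (r - B) ** C)


params_1975 = dict(a=0.025, b=0.39, A=0.06, B=-1.02, C=2.86)   # Table I, T = 1975

h = 1e-5
for r in (6, 12, 18):
    numeric = (theta(r + h, **params_1975) - theta(r - h, **params_1975)) / (2 * h)
    print(r, theta(r, **params_1975), numeric, dtheta_dr(r, **params_1975))
```

Because C > 2 for all cohorts in Table I, the closed form decays to zero for large r when D = 0, in line with the argument given above for dropping the integration constant.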

The above results allow us to further investigate differences between T-cohorts. Comparing
values of the external field θ(t, T) influencing T-scientists at the same point r = t - T
in their scientific career, one can show that the field is a decreasing function of T (see
Fig. 8). (We have also checked that the decreasing character of θ(T + r, T) versus T holds
for every value of r = 1, 2, ..., 18.) The above stems from the fact that younger (in terms
of T) scientists publish more than their older colleagues at the same age. The interesting
point here is that statistical physics allows one to describe the phenomenon in terms of a
changing external field, which leads to accelerated productivity as described in the
previous section.



In order to finalize our theoretical approach to scientific productivity we should explain
the mutual relationship between the theoretical distribution P(Ω) (11) and the empirical
distribution P(x; t, T) (see Fig. 4). Since the two distributions apply to the same
ensembles, there should exist a possibility to pass from one distribution to the other.
Such a possibility appears due to the density of states function g(x; t, T), which
expresses the number of allowed states Ω (cf. Eq. (5)) that scientists may be in, given
that the number of publications corresponding to these states equals x (6). Using the
concept of the density of states one can write

$$ P(x(\Omega); t, T) = g(x; t, T)\, P(\Omega), \tag{15} $$

and respectively the empirical function g(x; t, T), correct to the multiplicative factor
Z(t, T), may be obtained from the expression

$$ \frac{g(x; t, T)}{Z(t, T)} = P(x; t, T)\, e^{\theta(t,T)\, x}. \tag{16} $$
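Relation (16) translates into a very short computation once the empirical distribution and the field are known. The Python sketch below estimates g(x)/Z from a list of cumulative paper counts of one cohort at one seniority. The function name, the sample counts and the value of the field are made up for illustration; in the analysis described here θ would come from Eq. (14).

```python
import math
from collections import Counter


def empirical_density_of_states(paper_counts, theta):
    """Return {x: P(x) * exp(theta * x)}, i.e. g(x)/Z up to the unknown factor Z, per Eq. (16)."""
    n = len(paper_counts)
    hist = Counter(paper_counts)
    return {x: (count / n) * math.exp(theta * x) for x, count in sorted(hist.items())}


# Made-up cohort snapshot: cumulative paper counts of eight scientists at one seniority r.
sample = [1, 1, 1, 2, 2, 3, 5, 9]
g_over_z = empirical_density_of_states(sample, theta=0.2)
for x, value in g_over_z.items():
    print(x, round(value, 4))
```

Multiplying the empirical histogram by e^{θx} removes the Boltzmann-like weight, so that what remains reflects only how many hidden states correspond to a given number of publications x.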

In Fig. 9 we have presented how the empirical density of states g(x; t, T) depends on x.
The most striking feature of g(x; t, T) is that it does not depend separately on time t and
T, but it depends on their difference r = t - T (cf. bunches of curves shown in the figure)

$$ g(x; t, T) = g(x; r). \tag{17} $$

The above means that the density of states is an inherent characteristic describing
researchers of a given seniority r. It also certifies that the parameter θ(t, T) (14) has
the meaning of an external field, which is only responsible for the filling of the
corresponding states Ω (5) in the hidden phase space underlying the scientific community.
The analogy between our parameter θ and the inverse temperature β in the canonical ensemble
is indeed very close. External conditions expressed by the field θ do not change the
considered system, which in our case corresponds to a scientist characterized by a given
value of r. They only influence the probability (11) of realization of a state
corresponding to a given productivity x (6). In particular, the findings allow us to say
that representatives of younger cohorts usually coauthor many more articles than their
counterparts (in terms of the same r) belonging to older cohorts. It means that due to
external requirements (which we interpret as the publish-or-perish phenomenon)
representatives of younger cohorts are skilled (forced) to contribute more articles.

Finally, before we proceed to conclusions let us briefly comment on the shape of the
function g(x; r) (see Fig. 9). The function monotonically decreases for small and quickly
increases for large values of x, having a characteristic minimum for intermediate x. One
can argue that the corresponding curvature of g(x; r) may result from topological
requirements imposed by the relation x(Ω) on the hidden space Ω = {y_1, y_2, ..., y_n} (5).
A simple but still reasonable example of such a relation is graphically presented in
Fig. 10. (Although the figure presents only two- and three-dimensional phase spaces, the
reasoning below also holds for higher dimensions.) In the figure, the direction of the
dashed lines expresses a growing number of publications x, whereas the area of the
n-dimensional hypersurface is proportional to the number of states g(x; r) of a given value
of x. As one can see, the hypersurfaces x(Ω) corresponding to increasing values of x change
from convex to concave. The feature leads to the minimum in the density of states function,
and has a nice sociological interpretation.

In order to outline the mentioned sociological interpretation, let us assume that all
motivators y_i influencing scientific productivity have some minimal values. Such an
assumption seems to be natural since one cannot get a salary lower than a certain limit,
and it is impossible to have a negative number of coworkers. On the other hand, there are
no upper limits for these parameters. We are not even in a position to guess their units.
It follows that for visualization purposes all motivators may be limited to their positive
values, as shown in Fig. 10. Now, in order to justify the suggested convex character of the
hypersurface x(Ω) representing small values of x, one can argue that it corresponds to the
leading role of one selected motivator y_i, and the insignificant role of the other
parameters y_j with j ≠ i. In some sense, such naive thinking about factors influencing
scientists is consistent with the common experience that in the early stages of a career
only one factor provides motivation for scientific activity (e.g. satisfaction). Along with
growing x other motivators start to play a role (e.g. recognition and being in power),
which may be expressed by the mentioned convex-to-concave crossover.

FIG. 9: Density of states functions g(x; r) underlying different T-cohorts at different
stages of their scientific career r.

FIG. 10: Examples of phase trajectories x(Ω) in the space of scientific motivators
Ω = {y_1, y_2, ..., y_n} resulting in the corresponding shape of g(x; r). (Detailed
description of the figure is given in the text.)



IV. SUMMARY



In this paper we have attempted to provide a quantitative approach to the
publish-or-perish phenomenon, which refers to the pressure to constantly publish work in
order to further or sustain one's scientific career. Using data retrieved from the INSPEC
database we have quantitatively discussed a few syndromes of the phenomenon, including
continuous growth of the rate of scientific productivity, and a continuously decreasing
percentage of those scientists who stay in science for a long time. Methods of equilibrium
statistical physics have been applied for the analysis. We have shown that the observed
fat-tailed distributions of the total number of papers x authored by scientists may result
from a specific shape of the density of states function g(x; r) underlying the scientific
community. We have also argued that although different generations of scientists are
characterized by different productivity patterns, the function g(x; r) is inherent to
researchers of a given seniority r, and the publish-or-perish phenomenon may be
quantitatively characterized by a single time- and generation-dependent parameter θ, which
has the meaning of an external field influencing researchers.